Date: Fri May 8 04:34:05 2020
Scientist: Rasika Hudlikar
Sequencing (Waksman): Dibyendu Kumar
Statistics: Davit Sargsyan
Principal Investigator: Ah-Ng Kong
# Taxonomic Ranks:
# **K**ing **P**hillip **C**an n**O**t **F**ind **G**reen **S**ocks
# * Kingdom
# * Phylum
# * Class
# * Order
# * Family
# * Genus
# * Species
# options(stringsAsFactors = FALSE,
# scipen = 999)
# # Increase mmemory size to 64 Gb----
# invisible(utils::memory.limit(65536))
# str(knitr::opts_chunk$get())
# # NOTE: the below does not work!
# knitr::opts_chunk$set(echo = FALSE,
# message = FALSE,
# warning = FALSE,
# error = FALSE)
# require(knitr)
# require(kableExtra)
require(phyloseq)
Loading required package: phyloseq
# require(shiny)
require(data.table)
Loading required package: data.table
data.table 1.12.2 using 18 threads (see ?getDTthreads). Latest news: r-datatable.com
require(ggplot2)
Loading required package: ggplot2
require(plotly)
Loading required package: plotly
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
require(DT)
Loading required package: DT
require(lmerTest)
Loading required package: lmerTest
Loading required package: lme4
Loading required package: Matrix
Attaching package: ‘lmerTest’
The following object is masked from ‘package:lme4’:
lmer
The following object is masked from ‘package:stats’:
step
require(taxize)
Loading required package: taxize
source("source/functions_may2019.R")
# On Windows set multithread=FALSE----
mt <- TRUE
A/J inbred mice from the Jackson Laboratory are used to model cancer and for carcinogen testing given their high susceptibility to carcinogen-induced tumors.
From Cayman Chemical:Diallyl sulfide is a thioether found in garlic that can modulate the cytochrome P450 drug metabolizing system, activate the constitutive androstane receptor to regulate multidrug resistance-associated proteins, and upregulate the expression of detoxifying enzymes. Garlic-derived organosulfides such as diallyl sulfide have been shown to be highly protective from chemically-induced carcinogenesis in animals.
Nicotine-derived nitrosamine ketone (NNK) is one of the key tobacco-specific nitrosamines derived from nicotine. It plays an important role in carcinogenesis.
Nine (9) A/J mice were used in this study, 3 in each of the 3 treatment groups: Vehicle control (VC), NNK control and NNK+DAS. The study design is as follows:
DAS in corn oil was force-fed to mice throghout the study (weeks 0 to 6). A single dose of 24uM/0.1mL NNK suspended in glyceryl trioctate vehicle was injected at Week 1 intraperetoneally (IP). The vehicvle control group received vehicle only. Fecal samples were collected at 4 timepoints: before DAS treatment began (Week 0), and 1, 2 and 4 weeks after pretreatment (Week 1, Week 2 and Week 4 respectively).
FastQ files were downloaded from Rutgers Box. A total of 72 files (2 per sample, pair-ended) were downloaded.
This script (nrf2ubiome_dada2_jan2020_v1.Rmd) was developed using DADA2 Pipeline Tutorial (1.12) with tips and tricks from the University of Maryland Shool of Medicine Institute for Genome Sciences (IGS) Microbiome Analysis Workshop (April 8-11, 2019). The output of the DADA2 script (data_jan2020/ps_jan2020.RData) is explored in this document.
# Load data----
# Counts
load("data_jan2020/ps_jan2020.RData")
ps_jan2020@sam_data
Sample Data: [36 samples by 4 sample variables]:
# Taxonomy
load("data_jan2020/taxa.RData")
taxa <- data.table(seq16s = rownames(taxa),
taxa)
head(taxa)
ps_jan2020@sam_data$Treatment <- factor(ps_jan2020@sam_data$Treatment,
levels = c("NNK control",
"Vehicle control (VC)",
"NNK+DAS"))
ps_jan2020@sam_data$ID <- factor(ps_jan2020@sam_data$ID,
levels = unique(ps_jan2020@sam_data$ID))
ps_jan2020@sam_data$Week <- factor(ps_jan2020@sam_data$Week,
levels = c("Week0",
"Week1",
"Week2",
"Week4"))
samples <- ps_jan2020@sam_data
datatable(samples,
rownames = FALSE,
options = list(pageLength = nrow(samples)))
The OTUs were mapped to Bacteria (98.45%) and Eukaryota (1.30%) kingdoms, and 16 OTUs (0.26%) undefined.
The total of 6,174 unique sequences were found. Out of those, 6,078 were mapped to bacterial genomes.
dim(ps_jan2020@otu_table@.Data)
[1] 36 6174
# Remove OTU not mapped to Bacteria
ps0 <- subset_taxa(ps_jan2020,
Kingdom == "Bacteria")
dim(ps0@otu_table@.Data)
[1] 36 6078
p1 <- ggplot(smpl,
aes(x = Sample,
y = Total,
fill = Treatment)) +
facet_wrap(~ Week,
scale = "free_x") +
geom_bar(stat = "identity",
color = "black") +
scale_x_discrete("") +
scale_y_continuous("Number of Reads") +
scale_fill_grey("Treatment",
start = 0.1,
end = 1,
na.value = "red",
aesthetics = "fill") +
theme_bw() +
theme(panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.title.x = element_blank(),
# axis.text.x = element_blank(),
axis.text.x = element_text(angle = 45,
hjust = 1),
# axis.ticks.x=element_blank(),
legend.position = "top")
tiff(filename = "tmp/seq_depth_jan20.tiff",
height =6,
width = 6,
units = "in",
res = 600,
compression = "lzw+p")
print(p1)
graphics.off()
print(p1)
Out of the 6,078 OTUs 6,044 belonged to 13 Phylum. 34 of the OTUs (or 0.56% of bacterial OTUs) could not be mapped to a phylum.
t2 <- data.table(table(tax_table(ps0)[, "Phylum"],
exclude = NULL))
t2$V1[is.na(t2$V1)] <- "Unknown"
setorder(t2, -N)
t2[, pct := N/sum(N)]
setorder(t2, -N)
colnames(t2) <- c("Phylum",
"Number of OTUs",
"Percent of OTUs")
datatable(t2,
rownames = FALSE,
caption = "Number of Bacterial OTUs by Phylum",
class = "cell-border stripe",
options = list(search = FALSE,
pageLength = nrow(t2))) %>%
formatCurrency(columns = 2,
currency = "",
mark = ",",
digits = 0) %>%
formatPercentage(columns = 3,
digits = 2)
Remove:
1. Unmapped OTUs (“Unknown”).
2. Cyanobacteria: aerobic, photosynthesizing bacteria that probably got into the sample through food.
NOTE: Chloroflexi might be ok.
ps0 <- subset_taxa(ps0,
(!(Phylum %in% c("Unknown",
"Cyanobacteria"))))
Shannon index (aka Shannon enthrophy) is calculated as:
H’ = -sum(1 to R)p(i)ln(p(i)) When there is exactly 1 type of data (e.g. a single species in the sample), H’=0. The opposite scenario is when there are R>1 species present in the sample in the exact same amounts and H’=ln(R).
Shannon’s diversity index was calculated for each sample and ploted over time.
shannon.ndx <- estimate_richness(ps0,
measures = "Shannon")
shannon.ndx <- data.table(Sample = rownames(shannon.ndx),
shannon.ndx)
smpl <- merge(smpl,
shannon.ndx,
by = "Sample")
p1 <- ggplot(smpl,
aes(x = Total,
y = Shannon,
fill = Treatment,
shape = Week)) +
geom_point(size = 2) +
scale_shape_manual(breaks = unique(smpl$Week),
values = 21:24)
tiff(filename = "tmp/shannon_vs_depth_jan20.tiff",
height = 5,
width = 6,
units = "in",
res = 600,
compression = "lzw+p")
print(p1)
graphics.off()
ggplotly(p1)
Even though estimate_richness function does not adjust for the sequencing depth, there is no correlation between the index and the sample’s sequecing depth. Proceed with the comparison.
p1 <- plot_richness(ps0,
x = "Week",
measures = "Shannon") +
facet_wrap(~ Treatment) +
geom_line(aes(group = ID),
color = "black") +
geom_point(aes(fill = Treatment),
shape = 21,
size = 3,
color = "black") +
scale_x_discrete("") +
theme(axis.text.x = element_text(angle = 30,
hjust = 1,
vjust = 1))
ggplotly(p = p1,
tooltip = c("ID",
"value"))
p1 <- p1 +
scale_fill_discrete("") +
theme(legend.position = "top")
tiff(filename = "tmp/shannon.tiff",
height = 4,
width = 5,
units = "in",
res = 600,
compression = "lzw+p")
print(p1)
graphics.off()
# Average shannon index by treatment group
tmp <- data.table(copy(smpl))
tmp[, mu := mean(Shannon),
by = list(Treatment,
Week)]
tmp[, sem := sd(Shannon)/sqrt(.N),
by = list(Treatment,
Week)]
tmp <- unique(tmp[, c("Treatment",
"Week",
"mu",
"sem")])
p1 <- ggplot(tmp,
aes(x = Week,
y = mu,
ymin = mu - sem,
ymax = mu + sem,
fill = Treatment,
group = Treatment)) +
# facet_wrap(~ Genotype,
# scale = "free_x") +
geom_errorbar(position = position_dodge(0.3),
width = 0.4) +
geom_line(position = position_dodge(0.3)) +
geom_point(size = 3,
shape = 21,
position = position_dodge(0.3)) +
scale_x_discrete("") +
scale_y_continuous("Shannon Index") +
scale_fill_grey("Treatment",
start = 0,
end = 1,
na.value = "red",
aesthetics = "fill") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
# panel.border = element_blank(),
axis.title.x = element_blank(),
# axis.text.x = element_blank(),
axis.text.x = element_text(angle = 45,
hjust = 1),
axis.ticks.x=element_blank(),
legend.position = "top")
tiff(filename = "tmp/avg_shannon_jan20.tiff",
height = 4,
width = 5,
units = "in",
res = 600,
compression = "lzw+p")
print(p1)
graphics.off()
print(p1)
counts_p <- counts_by_tax_rank(dt1 = otu,
aggr_by = "Phylum")
fb <- t(counts_p[Phylum %in% c("Firmicutes",
"Bacteroidetes"), -1])
fb <- data.table(Sample = rownames(fb),
Firmicutes = fb[, 2],
Bacteroidetes = fb[, 1])
fb <- data.table(merge(smpl,
fb,
by = "Sample"))
lims <- log2(range(c(fb$Firmicutes,
fb$Bacteroidetes)))
p1 <- ggplot(fb,
aes(x = log2(Firmicutes),
y = log2(Bacteroidetes),
fill = Treatment)) +
geom_point(size = 2,
color = "black",
shape = 21) +
geom_abline(slope = 1,
intercept = 0,
linetype = "dashed") +
scale_x_continuous(limits = lims) +
scale_y_continuous(limits = lims) +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
p2 <- ggplot(fb,
aes(x = Week,
y = Firmicutes/Bacteroidetes,
fill = Treatment,
group = Treatment)) +
geom_hline(yintercept = 1,
linetype = "dashed") +
geom_point(size = 2,
color = "black",
shape = 21,
position = position_dodge(0.3)) +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_text(angle = 45,
hjust = 1),
legend.position = "none")
tiff(filename = "tmp/bact_vs_firm_jan20.tiff",
height = 7,
width = 5,
units = "in",
res = 600,
compression = "lzw+p")
gridExtra::grid.arrange(p1, p2)
graphics.off()
gridExtra::grid.arrange(p1, p2)
fb[, F_B := Firmicutes/Bacteroidetes]
fb[, log2_F_B := log2(F_B)]
m1 <- lm(log2_F_B ~ 0 + Week*Treatment,
data = fb)
s1 <- summary(m1)
ci1 <- confint(m1)
t1 <- data.table(Term = rownames(s1$coefficients),
Ratio = round(2^s1$coefficients[, 1], 3),
`95% C.I.L.L.` = round(2^ci1[, 1], 3),
`95% C.I.U.L.` = round(2^ci1[, 2], 3),
`p-Value` = round(s1$coefficients[, 4], 3),
Sign = "")
t1$Sign[t1$`p-Value` < 0.05] <- "*"
t1$Sign[t1$`p-Value` < 0.01] <- "**"
t1$`p-Value`[t1$`p-Value` < 0.001] <- "<0.001"
datatable(t1,
rownames = FALSE,
class = "cell-border stripe")
fb[, mu := mean(Firmicutes/Bacteroidetes),
by = c("Treatment",
"Week")]
fb[, sem := sd(Firmicutes/Bacteroidetes)/sqrt(.N),
by = c("Treatment",
"Week")]
mufb <- unique(fb[, c("Treatment",
"Week",
"mu",
"sem")])
p3 <- ggplot(mufb,
aes(x = Week,
y = mu,
ymin = mu - sem,
ymax = mu + sem,
fill = Treatment,
group = Treatment)) +
geom_hline(yintercept = 1,
linetype = "dashed") +
geom_errorbar(position = position_dodge(0.3),
width = 0.4) +
geom_line(position = position_dodge(0.3)) +
geom_point(size = 3,
shape = 21,
position = position_dodge(0.3)) +
scale_x_discrete("") +
scale_y_continuous("Firmicutes/Bacteroidetes") +
# scale_fill_grey("Treatment",
# start = 0,
# end = 1,
# na.value = "red",
# aesthetics = "fill") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_text(angle = 45,
hjust = 1),
legend.position = "none")
tiff(filename = "tmp/avg_firm_bact_jan20.tiff",
height = 4,
width = 6,
units = "in",
res = 600,
compression = "lzw+p")
print(p3)
graphics.off()
print(p3)
mufb[, est := paste0(round(mu, 2),
"(",
round(sem, 2),
")")]
t1 <- dcast.data.table(mufb,
Treatment ~ Week,
value.var = "est")
datatable(t1,
rownames = FALSE,
class = "cell-border stripe",
caption = "Average Ratio and SD of Firmicutes to Bacteroidetes",
options = list(search = FALSE,
pageLength = nrow(t1)))
tiff(filename = "tmp/firm_vs_bact_jan20.tiff",
height = 7,
width = 5,
units = "in",
res = 600,
compression = "lzw+p")
gridExtra::grid.arrange(p1, p3)
graphics.off()
gridExtra::grid.arrange(p1, p3)
otu <- data.table(ps0@tax_table@.Data,
t(ps0@otu_table@.Data))
dim(otu)
[1] 6064 42
[1] "Range of relative abundance of Bacteroidetes and Firmicutes combined (%)"
[1] 74.8 99.6
dt_pca <- t(ra_p[, 2:ncol(ra_p)])
colnames(dt_pca) <- ra_p$Phylum
dt_pca_p <- data.table(Sample = rownames(dt_pca),
dt_pca)
dt_pca_p <- merge(smpl,
dt_pca_p,
by = "Sample")
# Keep only the phylum with non-zero counts
tmp <- dt_pca_p[, 7:ncol(dt_pca_p)]
keep_p <- colnames(tmp)[colSums(tmp) > 0]
dt_pca <- dt_pca[, keep_p]
# m1 <- prcomp(dt_pca,
# center = TRUE,
# scale. = TRUE)
# m1 <- prcomp(dt_pca,
# center = FALSE,
# scale. = FALSE)
m1 <- prcomp(dt_pca,
center = TRUE,
scale. = FALSE)
summary(m1)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
Standard deviation 0.1239 0.04443 0.003799 0.001111 0.0008556 0.0002046 4.002e-05 2.329e-05 2.002e-05
Proportion of Variance 0.8852 0.11388 0.000830 0.000070 0.0000400 0.0000000 0.000e+00 0.000e+00 0.000e+00
Cumulative Proportion 0.8852 0.99905 0.999880 0.999960 1.0000000 1.0000000 1.000e+00 1.000e+00 1.000e+00
PC10 PC11
Standard deviation 1.434e-05 3.832e-06
Proportion of Variance 0.000e+00 0.000e+00
Cumulative Proportion 1.000e+00 1.000e+00
# Select PC-s to pliot (PC1 & PC2)
choices <- 1:2
# Add meta data
dt.scr <- data.table(m1$x[, choices])
dt.scr$Sample <- rownames(m1$x)
dt.scr <- merge(smpl,
dt.scr,
by = "Sample")
dt.scr
# Loadings, i.e. arrows (df.v)
dt.rot <- as.data.frame(m1$rotation[, choices])
dt.rot$feat <- rownames(dt.rot)
dt.rot <- data.table(dt.rot)
dt.rot
dt.load <- melt.data.table(dt.rot,
id.vars = "feat",
measure.vars = 1:2,
variable.name = "pc",
value.name = "loading")
dt.load$feat <- factor(dt.load$feat,
levels = unique(dt.load$feat))
# Plot loadings
p0 <- ggplot(data = dt.load,
aes(x = feat,
y = loading)) +
facet_wrap(~ pc,
nrow = 2) +
geom_bar(stat = "identity") +
ggtitle("PC Loadings") +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45,
hjust = 1))
p0
tiff(filename = "tmp/pc.1.2_loadings_phylum.tiff",
height = 5,
width = 6,
units = 'in',
res = 300,
compression = "lzw+p")
print(p0)
graphics.off()
print(p0)
# Axis labels
u.axis.labs <- paste(colnames(dt.rot)[1:2],
sprintf('(%0.1f%% explained var.)',
100*m1$sdev[choices]^2/sum(m1$sdev^2)))
u.axis.labs
[1] "PC1 (88.5% explained var.)" "PC2 (11.4% explained var.)"
cntr <- data.table(aggregate(x = dt.scr$PC1,
by = list(dt.scr$Treatment,
dt.scr$Week),
FUN = "mean"),
aggregate(x = dt.scr$PC2,
by = list(dt.scr$Treatment,
dt.scr$Week),
FUN = "mean")$x)
colnames(cntr) <- c("Treatment",
"Week",
"PC1",
"PC2")
cntr$tmp <- factor(cntr$Treatment,
levels = c("NNK control",
"Vehicle control (VC)",
"NNK+DAS"),
labels = c("NNK",
"VC",
"NNK+DAS"))
cntr$grp <- paste(cntr$tmp,
cntr$Week,
sep = "_")
# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
dt.rot[, rating:= (PC1)^2 + (PC2)^2]
setorder(dt.rot, -rating)
# Select top 3
dt.rot <- dt.rot[1:3, ]
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
aes(x = PC1,
y = PC2)) +
# coord_equal() +
geom_point(data = dt.scr,
aes(fill = Treatment,
shape = factor(Week)),
size = 3,
alpha = 0.5) +
geom_segment(aes(x = 0,
y = 0,
xend = 0.2*PC1,
yend = 0.2*PC2),
arrow = arrow(length = unit(1/2, 'picas')),
# size = 1,
color = "black") +
geom_text(aes(x = 0.22*PC1,
y = 0.22*PC2,
label = dt.rot$feat),
# size = 5,
hjust = 0.5) +
scale_x_continuous(u.axis.labs[1]) +
scale_y_continuous(u.axis.labs[2]) +
scale_fill_manual(name = "Treatment",
breaks = c("NNK control",
"Vehicle control (VC)",
"NNK+DAS"),
values = c("red",
"blue",
"green")) +
scale_shape_manual(breaks = 1:4,
values = 21:24) +
geom_label(data = cntr,
aes(x = PC1,
y = PC2,
label = grp,
colour = Treatment),
alpha = 0.5,
size = 3) +
scale_color_manual(guide = FALSE,
breaks = c("NNK control",
"Vehicle control (VC)",
"NNK+DAS"),
values = c("red",
"blue",
"green")) +
ggtitle("") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "none")
tiff(filename = "tmp/phylum_biplot_trt_jan20.tiff",
height = 7,
width = 7,
units = 'in',
res = 300,
compression = "lzw+p")
print(p1)
graphics.off()
ggplotly(p1)
geom_GeomLabel() has yet to be implemented in plotly.
If you'd like to see this geom implemented,
Please open an issue with your example code at
https://github.com/ropensci/plotly/issuesgeom_GeomLabel() has yet to be implemented in plotly.
If you'd like to see this geom implemented,
Please open an issue with your example code at
https://github.com/ropensci/plotly/issuesgeom_GeomLabel() has yet to be implemented in plotly.
If you'd like to see this geom implemented,
Please open an issue with your example code at
https://github.com/ropensci/plotly/issues
# Generic biplot
biplot(m1)
specieslist <- c("Streptococcus bovis",
"Helicobacter pylori",
"Fusobacterium nucleatum",
"Enterococcus faecalis",
"Lactobacillus acidophilus",
"Bifidobacterium longum")
t1 <- tax_name(query = c(specieslist),
get = c("phylum",
"class",
"order",
"family",
"genus"),
db = "ncbi")
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
[34m══[39m 1 queries [34m═══════════════[39m
Retrieving data for taxon 'Streptococcus bovis'
[32m✔ Found: [39m Streptococcus+bovis
[38;5;249m══[39m Results [38;5;249m═════════════════[39m
● Total: [32m1[39m
● Found: [32m1[39m
● Not Found: [32m0[39m
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
[34m══[39m 1 queries [34m═══════════════[39m
Retrieving data for taxon 'Helicobacter pylori'
[32m✔ Found: [39m Helicobacter+pylori
[38;5;249m══[39m Results [38;5;249m═════════════════[39m
● Total: [32m1[39m
● Found: [32m1[39m
● Not Found: [32m0[39m
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
[34m══[39m 1 queries [34m═══════════════[39m
Retrieving data for taxon 'Fusobacterium nucleatum'
[32m✔ Found: [39m Fusobacterium+nucleatum
[38;5;249m══[39m Results [38;5;249m═════════════════[39m
● Total: [32m1[39m
● Found: [32m1[39m
● Not Found: [32m0[39m
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
[34m══[39m 1 queries [34m═══════════════[39m
Retrieving data for taxon 'Enterococcus faecalis'
[32m✔ Found: [39m Enterococcus+faecalis
[38;5;249m══[39m Results [38;5;249m═════════════════[39m
● Total: [32m1[39m
● Found: [32m1[39m
● Not Found: [32m0[39m
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
[34m══[39m 1 queries [34m═══════════════[39m
Retrieving data for taxon 'Lactobacillus acidophilus'
[32m✔ Found: [39m Lactobacillus+acidophilus
[38;5;249m══[39m Results [38;5;249m═════════════════[39m
● Total: [32m1[39m
● Found: [32m1[39m
● Not Found: [32m0[39m
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
[34m══[39m 1 queries [34m═══════════════[39m
Retrieving data for taxon 'Bifidobacterium longum'
[32m✔ Found: [39m Bifidobacterium+longum
[38;5;249m══[39m Results [38;5;249m═════════════════[39m
● Total: [32m1[39m
● Found: [32m1[39m
● Not Found: [32m0[39m
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
# Genus these species belong to, found in this dataset
t1.1 <- unique(taxa[Genus %in% t1$genus, -1])
t1.1
# Count number of OTUs in each of these genus (in this data)
find_genus <- unique(otu[Genus %in% t1.1$Genus, ])
find_genus
tbl1 <- data.table(table(find_genus$Genus))
colnames(tbl1) <- c("Genus",
"N_OTU")
# Number of species in each of these genus (in NCBI reference database)
rm(t2.1)
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 4635595 247.6 7344555 392.3 7344555 392.3
Vcells 8898860 67.9 17824054 136.0 14784094 112.8
# NOTE: this does not work every time - KEEP TRYING until it does!
t2.1 <- downstream(x = t1$genus,
downto = "species",
db = "ncbi")
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
[34m══[39m 6 queries [34m═══════════════[39m
Retrieving data for taxon 'Streptococcus'
[32m✔ Found: [39m Streptococcus
Retrieving data for taxon 'Helicobacter'
[32m✔ Found: [39m Helicobacter
Retrieving data for taxon 'Fusobacterium'
[32m✔ Found: [39m Fusobacterium
Retrieving data for taxon 'Enterococcus'
[32m✔ Found: [39m Enterococcus
Retrieving data for taxon 'Lactobacillus'
[32m✔ Found: [39m Lactobacillus
Retrieving data for taxon 'Bifidobacterium'
[32m✔ Found: [39m Bifidobacterium
[38;5;249m══[39m Results [38;5;249m═════════════════[39m
● Total: [32m6[39m
● Found: [32m6[39m
● Not Found: [32m0[39m
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
No ENTREZ API key provided
Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
tbl2 <- data.table(Genus = names(t2.1),
N_Species = lapply(t2.1, nrow))
# Merge number of OTUs found and number of species kown for each genus
tbl <- merge(tbl1,
tbl2,
by = "Genus",
all = TRUE)
tbl$N_OTU[is.na(tbl$N_OTU)] <- 0
setorder(tbl, -N_OTU)
datatable(tbl,
rownames = FALSE)
# Meand and range relative abundance of each genus
tmp <- ra_g[Genus %in% t1$genus, ]
tmp1 <- apply(tmp[, -1],
MARGIN = 1,
FUN = function(a) {
return(data.table(Mean = mean(a),
SD = sd(a),
Min = min(a),
Max = max(a)))
})
out <- data.table(Genus = tmp$Genus,
rbindlist(tmp1))
datatable(out,
rownames = FALSE) %>%
formatPercentage(columns = 2:4,
digits = 2)
sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] taxize_0.9.95 lmerTest_3.1-0 lme4_1.1-21 Matrix_1.2-14 DT_0.6 plotly_4.9.0
[7] ggplot2_3.2.0 data.table_1.12.2 phyloseq_1.26.1
loaded via a namespace (and not attached):
[1] nlme_3.1-137 bold_1.0.0 httr_1.4.0 numDeriv_2016.8-1 tools_3.5.0
[6] R6_2.4.0 vegan_2.5-5 lazyeval_0.2.2 BiocGenerics_0.28.0 mgcv_1.8-23
[11] colorspace_1.4-1 permute_0.9-5 ade4_1.7-13 withr_2.1.2 tidyselect_0.2.5
[16] gridExtra_2.3 curl_3.3 compiler_3.5.0 cli_1.1.0 Biobase_2.42.0
[21] xml2_1.2.0 labeling_0.3 triebeard_0.3.0 scales_1.1.0 stringr_1.4.0
[26] digest_0.6.19 minqa_1.2.4 XVector_0.22.0 pkgconfig_2.0.2 htmltools_0.3.6
[31] htmlwidgets_1.3 rlang_0.4.0 rstudioapi_0.10 httpcode_0.3.0 shiny_1.3.2
[36] farver_2.0.3 zoo_1.8-5 jsonlite_1.6 crosstalk_1.0.0 dplyr_0.8.1
[41] magrittr_1.5 biomformat_1.10.1 Rcpp_1.0.1 munsell_0.5.0 S4Vectors_0.20.1
[46] Rhdf5lib_1.4.3 ape_5.3 lifecycle_0.1.0 stringi_1.4.3 yaml_2.2.0
[51] MASS_7.3-49 zlibbioc_1.28.0 rhdf5_2.26.2 plyr_1.8.4 grid_3.5.0
[56] parallel_3.5.0 promises_1.0.1 crayon_1.3.4 lattice_0.20-35 Biostrings_2.50.2
[61] splines_3.5.0 multtest_2.38.0 knitr_1.23 pillar_1.4.0 igraph_1.2.4.1
[66] boot_1.3-20 reshape2_1.4.3 codetools_0.2-15 stats4_3.5.0 crul_0.9.0
[71] glue_1.3.1 nloptr_1.2.1 httpuv_1.5.1 urltools_1.7.3 foreach_1.4.4
[76] gtable_0.3.0 purrr_0.3.2 tidyr_0.8.3 reshape_0.8.8 assertthat_0.2.1
[81] xfun_0.7 mime_0.6 xtable_1.8-4 later_0.8.0 survival_2.41-3
[86] viridisLite_0.3.0 tibble_2.1.3 iterators_1.0.10 IRanges_2.16.0 cluster_2.0.7-1